Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • new variable*based on existing data

    Hi,

    I have a variable "tot_alcohol" which is the total amount of daily alcohol consumption. I want to create a new variable called "heavy_drink" as individuals whose daily alcohol intake is in the upper 20th percentile of tot_alcohol. Can someone help me with this?

    Thanks!

  • #2
    Code:
    centile tot_alchol, centile(80)
    gen heavy_drink = (tot_alcohol >= r(c_1))

    Comment


    • #3
      No example data here, so some technique shown on an accessible sandbox:

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . _pctile mpg, p(80)
      
      . ret li
      
      scalars:
                       r(r1) =  25
      
      . scalar p80 = r(r1)
      
      . count if mpg < p80
        55
      
      . count if mpg == p80
        5
      
      . count if mpg > p80
        14
      
      
      . di (55 + 2.5) / 74
      .77702703
      
      . gen himpg = mpg > p80 if mpg < .
      Note that all percentiles, like all sample statistics, must be considered estimates subject to divisibility of sample size and ties and other quirks in the data.

      All that said, you're making a measure of personal risk dependent on what is true of other people. My personal risk from say jumping off a cliff is quite independent of how many other people are prone to jump off a cliff. (I am not in practice that way inclined, but I am not going to post about my real risks.)

      See e.g. http://biostat.mc.vanderbilt.edu/wik.../CatContinuous for that and many other pertinent arguments.
      Last edited by Nick Cox; 16 Nov 2018, 11:44.

      Comment


      • #4
        Originally posted by Scott Merryman View Post
        Code:
        centile tot_alchol, centile(80)
        gen heavy_drink = (tot_alcohol >= r(c_1))
        Thanks Scott. When I did that my heavy_drink variable only had values of 0 and 1, when I wanted it to have the actual values of alcohol consumption.

        Comment


        • #5
          So, do you want

          Code:
          gen wanted = cond(total >= r(c_1), total, 0)
          or something else yet again? You need to be more specific.

          Comment


          • #6
            So I have a variable called gm_tot_alcohol, which is the total amount of alcohol consumed daily by a sample of men. I want to make a new varaible called heavy_drink that only has the values of the top 20th percent from gm_tot_alcohol.

            here is the data

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int id byte(case age) float tobacco_amt int(gm_tot_alcohol heavy_alcohol) float alcohol_14 byte age_group float(tobacco_group p) byte unique float(count dx2 rstandard dx2_pos dx2_neg dbeta)
            549 0 62 17.5 0 0          0 3 2   .09022384 0 369   .5102295  -.7143036         .   .5102295     .01478875
            673 0 62   25 0 0          0 3 2   .09022384 0 421   .5102295  -.7143036         .   .5102295     .01478875
            222 0 28   60 0 0          0 1 3  .017411053 1 185 .017783912 -.13335633         . .017783912 .000064574386
            674 0 63    0 0 0          0 3 1   .04849342 0 421   .7884092   .8879241  .7884092          .    .024348766
            345 0 78    . 0 0          0 4 .           . 0 265          .          .         .          .             .
            543 0 48    0 0 0          0 2 1   .02860103 0 367    5.35276  2.3136032   5.35276          .     .06660466
             58 0 52    0 0 0          0 2 1   .02860103 1  56    5.35276  2.3136032   5.35276          .     .06660466
            477 0 45 12.5 0 0          0 2 2   .05418817 0 337  .17405827  -.4172029         .  .17405827   .0022076338
            598 0 55   25 0 0          0 3 2   .09022384 0 389   .5102295  -.7143036         .   .5102295     .01478875
            735 0 68    0 0 0          0 4 1   .06198522 0 444  1.6761652   1.294668 1.6761652          .     .03901731
            431 0 65  7.5 0 0          0 4 1   .06198522 1 320  1.6761652   1.294668 1.6761652          .     .03901731
            390 0 36 17.5 0 0          0 1 2  .006490329 0 293  24.043016   4.903368 24.043016          .      .1711064
            446 0 69    0 0 0          0 4 1   .06198522 0 323  1.6761652   1.294668 1.6761652          .     .03901731
            447 0 28 12.5 0 0          0 1 2  .006490329 0 323  24.043016   4.903368 24.043016          .      .1711064
            678 0 30    0 0 0          0 1 1  .003345979 0 421 .023606766 -.15364493         . .023606766  .00010676098
            512 0 64 12.5 0 0          0 3 2   .09022384 0 356   .5102295  -.7143036         .   .5102295     .01478875
            522 0 35    0 0 0          0 1 1  .003345979 0 358 .023606766 -.15364493         . .023606766  .00010676098
             31 0 61    0 0 0          0 3 1   .04849342 1  30   .7884092   .8879241  .7884092          .    .024348766
            913 1 47  2.5 0 0          0 2 1   .02860103 0 546    5.35276  2.3136032   5.35276          .     .06660466
            638 0 56    0 0 0          0 3 1   .04849342 0 407   .7884092   .8879241  .7884092          .    .024348766
            470 0 36  2.5 0 0          0 1 1  .003345979 0 333 .023606766 -.15364493         . .023606766  .00010676098
            858 1 64   45 0 0          0 3 3   .21197526 1 514   3.802489  1.9499972  3.802489          .     .08689924
            679 0 60  2.5 0 0          0 3 1   .04849342 0 421   .7884092   .8879241  .7884092          .    .024348766
            358 0 55 12.5 0 0          0 3 2   .09022384 1 272   .5102295  -.7143036         .   .5102295     .01478875
            188 0 49    0 0 0          0 2 1   .02860103 0 160    5.35276  2.3136032   5.35276          .     .06660466
            113 0 61    0 0 0          0 3 1   .04849342 0 101   .7884092   .8879241  .7884092          .    .024348766
            289 0 35    0 0 0          0 1 1  .003345979 0 231 .023606766 -.15364493         . .023606766  .00010676098
            393 0 48   25 0 0          0 2 2   .05418817 0 294  .17405827  -.4172029         .  .17405827   .0022076338
            621 0 60    0 0 0          0 3 1   .04849342 0 398   .7884092   .8879241  .7884092          .    .024348766
            351 0 48 12.5 0 0          0 2 2   .05418817 1 270  .17405827  -.4172029         .  .17405827   .0022076338
            125 0 44    0 0 0          0 1 1  .003345979 1 112 .023606766 -.15364493         . .023606766  .00010676098
            102 0 51    0 0 0          0 2 1   .02860103 0  91    5.35276  2.3136032   5.35276          .     .06660466
            894 1 68  7.5 0 0          0 4 1   .06198522 0 535  1.6761652   1.294668 1.6761652          .     .03901731
            490 0 55  7.5 0 0          0 3 1   .04849342 0 343   .7884092   .8879241  .7884092          .    .024348766
            878 1 56  2.5 0 0          0 3 1   .04849342 0 527   .7884092   .8879241  .7884092          .    .024348766
            514 0 38    . 0 0          0 1 .           . 0 356          .          .         .          .             .
            258 0 40 12.5 0 0          0 1 2  .006490329 0 212  24.043016   4.903368 24.043016          .      .1711064
             32 0 57  7.5 0 0          0 3 1   .04849342 0  30   .7884092   .8879241  .7884092          .    .024348766
            122 0 32 12.5 0 0          0 1 2  .006490329 1 109  24.043016   4.903368 24.043016          .      .1711064
            177 0 36   25 0 0          0 1 2  .006490329 0 152  24.043016   4.903368 24.043016          .      .1711064
            905 1 66   25 0 0          0 4 2   .11393563 1 543   7.833237   2.798792  7.833237          .     .05676018
            691 0 70  7.5 0 0          0 4 1   .06198522 0 426  1.6761652   1.294668 1.6761652          .     .03901731
            919 1 43 12.5 0 0          0 1 2  .006490329 0 550  24.043016   4.903368 24.043016          .      .1711064
            652 0 44  7.5 0 0          0 1 1  .003345979 0 413 .023606766 -.15364493         . .023606766  .00010676098
            510 0 35  7.5 0 0          0 1 1  .003345979 0 355 .023606766 -.15364493         . .023606766  .00010676098
            680 0 34    0 1 0 .071428575 1 1  .003433954 0 421 .006900628 -.08307002         . .006900628  9.066683e-06
             47 0 32    0 1 0 .071428575 1 1  .003433954 1  45 .006900628 -.08307002         . .006900628  9.066683e-06
            577 0 59    0 1 0 .071428575 3 1   .04970923 0 381   .1053277  -.3245423         .   .1053277   .0007135076
              7 0 76  2.5 1 0 .071428575 4 1   .06351673 1   7   .2063257  -.4542309         .   .2063257   .0028914756
            610 0 50  2.5 1 0 .071428575 2 1   .02933348 1 395  .03029542  -.1740558         .  .03029542  .00007567182
            873 1 67 12.5 1 0 .071428575 4 2   .11659116 1 523   7.632371  2.7626746  7.632371          .     .05579597
            466 0 70  2.5 1 0 .071428575 4 1   .06351673 0 331   .2063257  -.4542309         .   .2063257   .0028914756
            348 0 57   25 1 0 .071428575 3 2   .09238433 1 267  .10236917  -.3199518         .  .10236917   .0005845565
            142 0 63    0 1 0 .071428575 3 1   .04970923 1 123   .1053277  -.3245423         .   .1053277   .0007135076
            594 0 73  7.5 1 0 .071428575 4 1   .06351673 0 389   .2063257  -.4542309         .   .2063257   .0028914756
            119 0 34  7.5 2 0  .14285715 1 1 .0035242336 1 106  .01063144 -.10310888         .  .01063144 .000021389464
            110 0 61 17.5 2 0  .14285715 3 2   .09459117 1  99  .10507467  -.3241522         .  .10507467   .0006046857
            458 0 52    0 2 0  .14285715 2 1   .03008411 1 329  .06234941 -.24969864         .  .06234941   .0003165341
            639 0 30    0 2 0  .14285715 1 1 .0035242336 0 407  .01063144 -.10310888         .  .01063144 .000021389464
            770 0 28 12.5 2 0  .14285715 1 2  .006834947 1 456 .006890422 -.08300857         . .006890422  8.446876e-06
            475 0 52 17.5 2 0  .14285715 2 2   .05692032 1 336  .06061461 -.24620035         .  .06061461  .00025993306
            795 1 81 12.5 2 0  .14285715 4 2   .11930024 1 471   7.436655   2.727023  7.436655          .     .05484399
             42 0 67   35 2 0  .14285715 4 3   .26869997 1  40   .3781981  -.6149781         .   .3781981    .011085955
            624 0 46  2.5 2 0  .14285715 2 1   .03008411 0 398  .06234941 -.24969864         .  .06234941   .0003165341
            799 1 65  7.5 2 0  .14285715 4 1  .065083444 1 473  14.432213   3.798975 14.432213          .     .06763676
            442 0 36   45 2 0  .14285715 1 3  .018324837 1 323  .01873748 -.13688493         .  .01873748  .00007084302
            250 0 26    0 2 0  .14285715 1 1 .0035242336 0 206  .01063144 -.10310888         .  .01063144 .000021389464
            729 0 52   45 3 0   .2142857 2 3   .14385769 1 443  .17095223  -.4134637         .  .17095223   .0029729444
            752 0 39    0 3 0   .2142857 1 1  .003616878 0 451 .018212212 -.13495263         . .018212212   .0000623877
            731 0 42  2.5 3 0   .2142857 1 1  .003616878 0 443 .018212212 -.13495263         . .018212212   .0000623877
            225 0 59 17.5 3 0   .2142857 3 2   .09684511 1 188  .10785163  -.3284077         .  .10785163   .0006254476
             65 0 37  7.5 3 0   .2142857 1 1  .003616878 1  61 .018212212 -.13495263         . .018212212   .0000623877
             72 0 43  7.5 3 0   .2142857 1 1  .003616878 0  65 .018212212 -.13495263         . .018212212   .0000623877
            794 1 71 12.5 3 0   .2142857 4 2   .12206358 1 470   7.245958   2.691832  7.245958          .     .05390421
            227 0 69    0 3 0   .2142857 4 1   .06668605 1 190   .0717899 -.26793638         .   .0717899   .0003406843
            439 0 38  2.5 3 0   .2142857 1 1  .003616878 0 321 .018212212 -.13495263         . .018212212   .0000623877
            576 0 42  7.5 4 0   .2857143 1 1  .003711949 1 381 .003728376 -.06106043         . .003728376 2.5988686e-06
            605 0 76    0 4 0   .2857143 4 1   .06832524 0 393   .2232102  -.4724513         .   .2232102   .0032490236
            513 0 91  2.5 4 0   .2857143 4 1   .06832524 0 356   .2232102  -.4724513         .   .2232102   .0032490236
             13 0 60    0 4 0   .2857143 3 1    .0535322 1  13  .05675707 -.23823743         .  .05675707  .00019778043
            506 0 47 17.5 4 0   .2857143 2 2   .05978151 1 354  .06386142 -.25270817         .  .06386142   .0002800755
            203 0 74  7.5 4 0   .2857143 4 1   .06832524 1 170   .2232102  -.4724513         .   .2232102   .0032490236
            260 0 79    0 5 0   .3571429 4 1   .07000169 0 212  1.3301448   1.153319 1.3301448          .    .033003487
            303 0 87    0 5 0   .3571429 4 1   .07000169 0 239  1.3301448   1.153319 1.3301448          .    .033003487
             38 0 71    0 5 0   .3571429 4 1   .07000169 1  36  1.3301448   1.153319 1.3301448          .    .033003487
            421 0 81 12.5 5 0   .3571429 4 2    .1277557 1 314  .14757627  -.3841566         .  .14757627   .0011168587
            666 0 69    0 5 0   .3571429 4 1   .07000169 0 418  1.3301448   1.153319 1.3301448          .    .033003487
            205 0 49    0 5 0   .3571429 2 1   .03244932 1 171  .03362604 -.18337403         .  .03362604  .00008867987
            408 0 38   35 5 0   .3571429 1 3   .01978439 1 305   .0202648  -.1423545         .   .0202648  .00008141645
            330 0 48   25 5 0   .3571429 2 2   .06126212 1 257  .06554951  -.2560264         .  .06554951   .0002907061
            453 0 62  7.5 5 0   .3571429 3 1   .05486706 1 325  .05825665 -.24136415         .  .05825665  .00020515347
            684 0 42    0 5 0   .3571429 1 1 .0038095096 1 424 .003826797 -.06186111         . .003826797  2.721454e-06
            853 1 75  7.5 5 0   .3571429 4 1   .07000169 0 509  1.3301448   1.153319 1.3301448          .    .033003487
            660 0 27    0 6 0   .4285714 1 1 .0039096237 0 416 .007861338  -.0886642         . .007861338 .000011416222
            154 0 27    0 6 0   .4285714 1 1 .0039096237 1 134 .007861338  -.0886642         . .007861338 .000011416222
            131 0 48   25 6 0   .4285714 2 2  .062776946 1 117 .067282245  -.2593882         . .067282245   .0003017278
            233 0 41   45 6 0   .4285714 1 3   .02029577 1 195  .02080116 -.14422607         .  .02080116  .00008528385
            211 0 47    0 7 0         .5 2 1  .034124944 1 177  .03542634 -.18821886         .  .03542634       .000096
            699 0 64    0 7 0         .5 3 1   .05763135 1 432  .06137577 -.24774133         .  .06137577  .00022071248
            682 0 39  2.5 7 0         .5 1 1  .004012359 1 422 .004031505 -.06349413         . .004031505  2.984543e-06
            end
            label values case case
            label def case 0 "Healthy", modify
            label def case 1 "Esophageal Cancer", modify
            label values heavy_alcohol alcohol
            label def alcohol 0 "Light/Moderate", modify
            label values age_group age
            label def age 1 "44 or less", modify
            label def age 2 "45-54", modify
            label def age 3 "55-64", modify
            label def age 4 "65 or older", modify
            label values tobacco_group tob
            label def tob 1 "0-9 gm/d", modify
            label def tob 2 "10-29 gm/d", modify
            label def tob 3 "over 30 gm/d", modify

            Comment


            • #7
              Eve, it looks like both Scott and Nick's code will work for you if based off of gm_tot_alcohol. In your -dataex- sample, the top 20% of heaviest drinkers consume >=4 drinks per day.

              Code:
              summ gm_tot_alcohol, detail
              
                                     gm_tot_alcohol
              -------------------------------------------------------------
                    Percentiles      Smallest
               1%            0              0
               5%            0              0
              10%            0              0       Obs                 100
              25%            0              0       Sum of Wgt.         100
              
              50%            1                      Mean               1.85
                                      Largest       Std. Dev.      2.152635
              75%            3              6
              90%            5              7       Variance       4.633838
              95%            6              7       Skewness       .8363514
              99%            7              7       Kurtosis       2.402493
              
              * Determine the various percentile cut points
              centile gm_tot_alcohol, centile(10(10)90)
              
                                                                     -- Binom. Interp. --
                  Variable |       Obs  Percentile    Centile        [95% Conf. Interval]
              -------------+-------------------------------------------------------------
              gm_tot_alc~l |       100         10           0               0           0
                           |                   20           0               0           0
                           |                   30           0               0           0
                           |                   40           0               0           1
                           |                   50           1               0           2
                           |                   60           2               1           3
                           |                   70           3               2           4
                           |                   80           4               3           5
                           |                   90           5               5           6
              
              * Displaying the relevant stored value
              display r(c_8)
              4
              
              centile gm_tot_alcohol, centile(80)
              
                                                                     -- Binom. Interp. --
                  Variable |       Obs  Percentile    Centile        [95% Conf. Interval]
              -------------+-------------------------------------------------------------
              gm_tot_alc~l |       100         80           4               3           5
              
              * Displaying the relevant stored value
              display r(c_1)
              4

              Code:
              centile gm_tot_alcohol, centile(80)
              display r(c_1)  // this is just so you can see it
              
              gen heavy_drinker = 0
              replace heavy_drinker = gm_tot_alcohol if gm_tot_alcohol >=r(c_1) & gm_tot_alcohol !=.
              * Equiv to if gm_tot_alcohol >=4 

              Which is a longer version of Nick's code in #5 above
              Code:
              * condition() does (logical_test, [value_if_true], [value_if_false])  (It's like the IF() function in Excel if that's more familiar)
              * cond() also handles nonmissing, see help cond
              centile gm_tot_alcohol, centile(80)
              gen heavy_drinker = cond(gm_tot_alcohol >= r(c_1), gm_tot_alcohol, 0)

              Comment


              • #8
                Thank you so much David!!

                Comment

                Working...
                X